STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition
Authors
Abstract
In this paper, we present a novel SpaTial Attention Residue Network (STAR-Net) for recognising scene text. STAR-Net is equipped with a spatial attention mechanism that employs a spatial transformer to remove the distortion of text in natural images. This allows the subsequent feature extractor to focus on the rectified text region without being sidetracked by the distortion. STAR-Net also uses residue convolutional blocks to build a very deep feature extractor, which is essential for extracting discriminative text features in this fine-grained recognition task. Combining the spatial attention mechanism with the residue convolutional blocks, our STAR-Net is the deepest end-to-end trainable neural network for scene text recognition. Experiments on five public benchmark datasets show that STAR-Net achieves performance comparable to state-of-the-art methods on scene text with little distortion, and outperforms these methods on scene text with considerable distortion.
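The rectification step mentioned in the abstract can be illustrated with a minimal sketch of how a spatial transformer resamples an image through an affine transformation. This is not the STAR-Net implementation: the function name `affine_sample`, the choice of an affine transformation, the zero padding, and the normalised [-1, 1] coordinate convention are all assumptions made for illustration. In a real network the matrix `theta` would be predicted by a small localisation network rather than supplied by hand.

```python
import math


def affine_sample(image, theta, out_h, out_w):
    """Resample `image` (a list of rows of floats) through a 2x3 affine matrix.

    Hypothetical helper for illustration only. Output pixel coordinates are
    normalised to [-1, 1]; each target point (x_t, y_t) is mapped to a source
    point (x_s, y_s) = theta * (x_t, y_t, 1) and read with bilinear
    interpolation. Locations outside the input are treated as zero.
    """
    in_h, in_w = len(image), len(image[0])

    def pixel(r, c):
        # Zero padding outside the input image.
        if 0 <= r < in_h and 0 <= c < in_w:
            return image[r][c]
        return 0.0

    out = []
    for i in range(out_h):
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            # Affine map from target grid to source coordinates.
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            # Convert normalised coordinates back to pixel indices.
            col = (x_s + 1.0) * (in_w - 1) / 2.0
            r = (y_s + 1.0) * (in_h - 1) / 2.0
            c0, r0 = math.floor(col), math.floor(r)
            dc, dr = col - c0, r - r0
            # Bilinear interpolation over the four neighbouring pixels.
            value = (
                pixel(r0, c0) * (1 - dr) * (1 - dc)
                + pixel(r0, c0 + 1) * (1 - dr) * dc
                + pixel(r0 + 1, c0) * dr * (1 - dc)
                + pixel(r0 + 1, c0 + 1) * dr * dc
            )
            row.append(value)
        out.append(row)
    return out


image = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
flip_x = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
same = affine_sample(image, identity, 2, 3)
mirrored = affine_sample(image, flip_x, 2, 3)
```

With the identity matrix the sampler returns the input unchanged, and negating the first entry mirrors it horizontally. Because bilinear sampling is differentiable with respect to `theta`, a rectification module of this kind can be trained end-to-end together with the recogniser that follows it.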
Similar resources
Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition
In this paper, we present a Character-Aware Neural Network (Char-Net) for recognizing distorted scene text. Our Char-Net is composed of a word-level encoder, a character-level encoder, and an LSTM-based decoder. Unlike previous work which employed a global spatial transformer network to rectify the entire distorted text image, we take an approach of detecting and rectifying individual characters....
SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
Detecting and recognizing text in natural scene images is a challenging, yet not completely solved task. In recent years several new systems that try to solve at least one of the two sub-tasks (text detection and text recognition) have been proposed. In this paper we present SEE, a step towards semi-supervised neural networks for scene text detection and recognition, that can be optimized end-t...
STN-OCR: A single Neural Network for Text Detection and Text Recognition
Detecting and recognizing text in natural scene images is a challenging, yet not completely solved task. In recent years several new systems that try to solve at least one of the two sub-tasks (text detection and text recognition) have been proposed. In this paper we present STN-OCR, a step towards semi-supervised neural networks for scene text recognition that can be optimized end-to-end. In c...
A Neural Network for Spatial Relations: Connecting Visual Scenes to Linguistic Descriptions
This paper presents a system based on neural networks that can analyse spatial relations in a visual scene and connect them to appropriate linguistic descriptions. The system learns spatial concepts like “right of” and “above” by viewing a visual scene containing a number of objects and simultaneously receiving a text string describing the scene. The spatial relations between the objects in the...
Reading Scene Text with Attention Convolutional Sequence Modeling
Reading text in the wild is a challenging task in the field of computer vision. Existing approaches have mainly adopted Connectionist Temporal Classification (CTC) or attention models based on Recurrent Neural Networks (RNNs), which are computationally expensive and hard to train. In this paper, we present an end-to-end Attention Convolutional Network for scene text recognition. Firstly, instead of RNN...
Publication date: 2016